Simplify bitmasks #375
Conversation
> I was able to remove the `ToBitMask` and `ToBitMaskArray` traits and remove the complex trait bounds by making these two changes:
>
> `to_bitmask` always returns `u64`. For larger vectors this truncates to 64 lanes.

This is likely less efficient, since LLVM has to figure out that most of the 64-element vector is zero and can thereby shrink the types... seems iffy.

> `to_bitmask_array` is now `to_bitmask_vector` and returns `Simd<T, N>` for `Mask<T, N>`. This of course wastes bytes (in memory), but simplifies the function signature.

I don't think removing `to_bitmask_array` is a good idea, since it's much easier to use when you need to directly inspect or modify the bitmask. Also, it potentially wastes much less memory. I'm ok with also having `to_bitmask_vector`...

> This also makes it easier to do parallel operations on the entire byte array, which is a common use-case for bitmasks anyway.

That seems less likely, since most of the parallel operations you'd want are already covered by `Mask`...
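For concreteness, the truncating semantics proposed for `to_bitmask` can be modeled in plain stable Rust. `to_bitmask_model` below is a hypothetical editorial stand-in for illustration, not the real `Mask::to_bitmask` implementation or intrinsic:

```rust
/// Hypothetical model of the proposed `to_bitmask` semantics: pack each
/// lane's boolean into the corresponding low bit of a `u64` (lane 0 maps
/// to bit 0), truncating any lanes past 64.
fn to_bitmask_model(lanes: &[bool]) -> u64 {
    lanes
        .iter()
        .take(64) // vectors with more than 64 lanes truncate here
        .enumerate()
        .fold(0u64, |acc, (i, &set)| acc | ((set as u64) << i))
}

fn main() {
    // Lanes 0, 2, and 3 set -> bits 0, 2, and 3 -> 0b1101.
    assert_eq!(to_bitmask_model(&[true, false, true, true]), 0b1101);
    println!("ok");
}
```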
I haven't read through the code, but from the description, I'm very much in favor of this simplification!
I think LLVM is generally very good at this sort of range optimization, FWIW. I may be misunderstanding though, and it might be worth checking?
LLVM is generally good with zero-elements in vectors, but if we like this API I intend on modifying the `simd_bitmask` intrinsic to instead allow zero-extending the integer result. Padding the vector is just a workaround until that's done.

Regarding having access to the bytes, you can always call `to_ne_bytes` on the resulting vector, and copy it into an array to save memory. Maybe cumbersome for that specific usage but it saves a lot on the API. `ToBitMaskArray`, with its associated type and bounds, is a bit much for a single function IMO, especially since most people will typically want the `u64`. This also isn't precluding an array implementation in the future, especially if const generics get better.
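As a rough sketch of the byte-access point: the `u64` result itself already exposes its bytes via the stable `u64::to_ne_bytes`, analogous to calling `to_ne_bytes` on the returned vector on nightly. This is an editorial illustration, not code from the PR:

```rust
fn main() {
    // A 64-lane bitmask held as a u64; its bytes are available directly
    // via the stable `u64::to_ne_bytes`, which can then be copied into an
    // array to save memory without a dedicated array-returning API.
    let bits: u64 = 0b1010_0110;
    let bytes: [u8; 8] = bits.to_ne_bytes();
    // Round-trips regardless of target endianness.
    assert_eq!(u64::from_ne_bytes(bytes), bits);
    println!("{bytes:?}");
}
```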
not in this case:
If we like the API I can update the intrinsic.
Ugh, I've definitely seen it do better than that. I added a new workaround that doesn't require padding the vectors.
other than that, looks good enough for me.
GitHub apparently never posted my review comment that…

Oh, yeah, I'll add that.